Overview
This vignette explains how to prepare your data for
SigFun. Both the streamlined (sig2Fun())
and stepwise (sigCor() + GSEA + plotting) workflows require
the same input format: a properly constructed
SummarizedExperiment (SE). This document
adds a small checklist, clear column requirements, and validation
helpers so your data passes seamlessly into either workflow.
Quick Checklist
Expression (assays$abundance): numeric matrix, genes × samples; no row or column may consist entirely of NA values; zeros allowed; preprocessed (e.g., log-TPM/CPM).
RowData (rowData): gene annotations with required columns: ensg_id, gene_symbol, gene_biotype.
ColData (colData): sample info whose row names (sample IDs) match the expression column names. Include ≥1 signature column (numeric or binary 0/1). Add any other covariates as needed.
Ontology (t2g): a data.frame with columns gs_name and ensembl_gene, using the same gene ID type as rowData$ensg_id (Ensembl IDs recommended). See Ontology Database Setup for setup.
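The checklist can be run as a pre-flight check on the raw pieces before they are assembled into an SE. The sketch below uses base R only. The helper check_inputs() is hypothetical and not part of SigFun, and the toy gene IDs, sample IDs, and the column name signature are illustrative assumptions. It also assumes the usual SE convention that sample IDs are the row names of the sample table.

```r
# Toy inputs; every name and value below is illustrative, not a SigFun default.
set.seed(1)
gene_ids   <- paste0("ENSG", sprintf("%011d", 1:4))
sample_ids <- paste0("S", 1:3)

expr <- matrix(rnorm(12), nrow = 4, dimnames = list(gene_ids, sample_ids))

gene_info <- data.frame(
  ensg_id      = gene_ids,
  gene_symbol  = paste0("GENE", 1:4),
  gene_biotype = rep("protein_coding", 4),
  row.names    = gene_ids
)

sample_info <- data.frame(
  signature = c(0, 1, 1),      # binary 0/1 signature column (assumed name)
  row.names = sample_ids
)

t2g <- data.frame(
  gs_name      = c("SET_A", "SET_A", "SET_B"),
  ensembl_gene = gene_ids[c(1, 2, 4)]
)

# Hypothetical helper (not part of SigFun): one TRUE/FALSE per checklist item.
check_inputs <- function(expr, gene_info, sample_info, t2g) {
  required_row_cols <- c("ensg_id", "gene_symbol", "gene_biotype")
  c(
    numeric_matrix  = is.matrix(expr) && is.numeric(expr),
    no_all_na_rows  = !any(rowSums(!is.na(expr)) == 0),
    no_all_na_cols  = !any(colSums(!is.na(expr)) == 0),
    rowdata_columns = all(required_row_cols %in% names(gene_info)),
    samples_match   = identical(rownames(sample_info), colnames(expr)),
    has_signature   = any(vapply(sample_info, is.numeric, logical(1))),
    t2g_columns     = all(c("gs_name", "ensembl_gene") %in% names(t2g)),
    t2g_ids_overlap = any(t2g$ensembl_gene %in% gene_info$ensg_id)
  )
}

checks <- check_inputs(expr, gene_info, sample_info, t2g)
print(checks)
```

Returning a named logical vector rather than stopping at the first failure makes it easy to see every item that needs attention, for example with names(checks)[!checks].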
Input Data Requirements
SummarizedExperiment Structure
SigFun expects an SE with three essentials:
Assay: expression matrix (genes × samples), stored as assays$abundance
RowData: gene annotations (must align row-wise with the assay)
ColData: sample information (must align column‑wise with the assay)
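The three essentials can be assembled into one object with the SummarizedExperiment() constructor. The following is a minimal sketch on toy data, not SigFun's own setup code. The gene IDs, sample IDs, and the signature column are made-up placeholders, and the final checks only confirm the row-wise and column-wise alignment described above.

```r
library(SummarizedExperiment)

set.seed(1)
gene_ids   <- paste0("ENSG", sprintf("%011d", 1:4))   # placeholder Ensembl-style IDs
sample_ids <- paste0("S", 1:3)                        # placeholder sample IDs

# Assay: genes x samples, with matching dimnames
abundance <- matrix(rnorm(12), nrow = 4, dimnames = list(gene_ids, sample_ids))

# RowData: one row per gene, in the same order as the assay rows
row_data <- S4Vectors::DataFrame(
  ensg_id      = gene_ids,
  gene_symbol  = paste0("GENE", 1:4),
  gene_biotype = rep("protein_coding", 4),
  row.names    = gene_ids
)

# ColData: one row per sample, in the same order as the assay columns
col_data <- S4Vectors::DataFrame(
  signature = c(0.2, 1.5, -0.7),   # numeric signature column (assumed name)
  row.names = sample_ids
)

se <- SummarizedExperiment(
  assays  = list(abundance = abundance),
  rowData = row_data,
  colData = col_data
)

# Alignment checks: rows follow the gene table, columns follow the sample table
stopifnot(
  "abundance" %in% assayNames(se),
  identical(rownames(se), rowData(se)$ensg_id),
  identical(colnames(se), rownames(colData(se)))
)
```

Naming the assay in the list passed to assays is what makes it reachable as assays(se)$abundance.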